CSI5387: Data Mining Project
Author
Abstract
Web pages have become more like applications than documents. Not only do they provide dynamic content, they also allow users to play games, send email, and do many other tasks that used to be reserved for traditional applications. One of the major technologies enabling web application creation is JavaScript, which allows execution of code in the browser. Unfortunately, because it is so powerful, JavaScript is also abused by attackers. Often, the only suggested defense against malicious web scripts is to disable JavaScript [2]. Doing so, however, disables much of the functionality of some web pages and completely breaks others. As such, disabling JavaScript is not a viable option for many people, and few other options exist for protection from malicious web code.
Users might gain some protection if they could run some JavaScript rather than having only the choice of all or none. People might only really need menu code, or code that displays videos, and want to discard everything else. However, if we are to apply this sort of protection to existing web pages, there needs to be some way to determine the class of a script automatically.
For this project, I've chosen two classes to distinguish. The first is code used for displaying JavaScript menus, which is probably a fairly low-risk type of JavaScript that many users would wish to run. The second is advertising JavaScript, which is often more complex and may be something users wish to disable. Although it might have been more interesting from a security perspective to use malicious JavaScript as one of my classes, that data would be more difficult to gather. Since at the outset I was not even sure JavaScript could be classified in this manner, it seemed best to use training samples that were easier to obtain.
The goal of this project is to learn whether the "bag of words" approach provides suitable attributes for classification, to explore properties of the resulting feature space, and to determine whether reasonable accuracy can be achieved in distinguishing the two classes defined.
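As an illustration of what a "bag of words" representation of JavaScript might look like, here is a minimal Python sketch. It is not the project's actual feature extraction: the tokenizer (identifier-like words only), the two sample snippets, and the names `bag_of_words` and `to_vector` are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a "word" is any identifier-like token (letters, digits, _ and $).
# The project's real tokenization rules are not given in the abstract.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def bag_of_words(script: str) -> Counter:
    """Count how often each identifier-like token occurs in a script."""
    return Counter(TOKEN_RE.findall(script))


def to_vector(bag: Counter, vocabulary: list) -> list:
    """Turn a bag into a fixed-length count vector over a shared vocabulary."""
    return [bag.get(term, 0) for term in vocabulary]


# Hypothetical samples standing in for the two classes (menu vs. advertising).
menu_snippet = 'function showMenu(id) { document.getElementById(id).style.display = "block"; }'
ad_snippet = 'document.write("<img src=banner.gif>"); var ad = new Image();'

bags = [bag_of_words(menu_snippet), bag_of_words(ad_snippet)]

# The shared vocabulary defines the attributes (one column per distinct word).
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(bag, vocabulary) for bag in bags]

for label, vector in zip(["menu", "advertising"], vectors):
    print(label, dict(zip(vocabulary, vector)))
```

Each script becomes one row of word counts, and any standard classifier could then be trained on those rows. Word order and program structure are discarded, which is exactly the simplification whose adequacy the project sets out to test.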
Publication date: 2008